Recording of Czech and Slovak Telephone Databases within SpeechDat-E

Authors

  • Jan Cernocký
  • Petr Pollák
  • Milan Rusko
  • Václav Hanzl
  • Marián Trnka
Abstract

Databases of five East European languages (Czech, Slovak, Russian, Polish and Hungarian) are being created within the SpeechDat-E project. This paper describes the overall design of the SpeechDat-E databases and concentrates on the Czech (1000 speakers) and Slovak (1000 speakers) databases. The item structure and recording specifications are presented, with a more detailed description of the language-specific items. Attention is also paid to the geographic and dialect distribution of speakers. The paper also presents the recruitment strategy.

Similar resources

SpeechDat(E) - Eastern European Telephone Speech Databases

This paper describes the creation of five new telephony speech databases for Central and Eastern European languages within the SpeechDat(E) project. The 5 languages concerned are Czech, Polish, Slovak, Hungarian, and Russian. The databases follow SpeechDat-II specifications with some language specific adaptation. The present paper describes the differences between SpeechDat(E) and earlier Speec...

Speechdat-e: five eastern european speech databases for voice-operated teleservices completed

In the Speechdat-E project five medium-large telephone speech databases have been collected for Czech, Hungarian, Polish, Russian, and Slovak. The project was recently concluded. This paper reports briefly on the contents of the databases, elaborates on experiences gained from the data recordings and from the validation of the databases. The availability of the databases to the public is addres...

Crosslingual and bilingual speech recognition with Slovak and Czech speechdat-e databases

This paper presents the work on crosslingual and bilingual speech recognition carried out with SpeechDat databases for Czech and Slovak language. The work follows the MASPER initiative that was formed as a part of the COST 278 Action. In crosslingual experiments the expert-driven and the data-driven approaches were used for transferring monolingual source acoustic models to a target language. Th...

Comparison of Slovak and Czech speech recognition based on grapheme and phoneme acoustic models

Grapheme-based mono-, cross- and bilingual speech recognition of Czech and Slovak is presented in the paper. The training and testing procedures follow the MASPER initiative that was formed as a part of the COST 278 Action. All experiments were performed using Czech and Slovak SpeechDat-E databases. Grapheme-based models gave equivalent recognition performance compared to phoneme-based models in ...

SpeechDat Cymru: A Large-scale Welsh Telephony Database

We describe the collection of SpeechDat Cymru, a 2000-speaker speech recognition database for the Welsh language, recorded over the public switched telephone network (PSTN). It is collected as part of SpeechDat(II), an ELRA project which deals with the creation of databases in over 20 different European languages and dialects. Design issues common to all SpeechDat(II) databases are discussed, i...

Publication date: 1999